Welcome to P K Kelkar Library, Online Public Access Catalogue (OPAC)

Exploiting the power of group differences: using patterns to solve data analysis problems

By: Dong, Guozhu, 1957- [author].
Material type: Book
Series: Synthesis digital library of engineering and computer science; Synthesis lectures on data mining and knowledge discovery, #16.
Publisher: [San Rafael, California] : Morgan & Claypool, 2019.
Description: 1 PDF (xv, 130 pages) : illustrations.
Content type: text
Media type: electronic
Carrier type: online resource
ISBN: 9781681735030.
Subject(s): Group theory | Pattern perception -- Data processing | Quantitative research | Data mining | Machine learning | intrusion detection | compound selection | complex disease analysis | extreme instance selection | factor ranking | prediction model analysis | group difference analysis | feature | multifactor interaction | diverse relationship | heterogeneity | boosting | ensemble | association rule | emerging pattern | contrast pattern | frequent pattern | distance metric | interpretability | data mining | machine learning | data analytic | classification | regression | clustering | anomaly detection | outlier detection
DDC classification: 512.2
Online resources: Abstract with links to resource
Also available in print.
Contents:
1. Introduction and overview -- 1.1 Importance of group differences -- 1.2 Summary of chapters -- 1.2.1 Reading order of the chapters -- 1.3 Known uses of group differences via emerging patterns -- 1.4 Unique properties of emerging pattern based methods -- 1.5 Scenarios where emerging patterns are especially useful -- 1.6 Related topics not covered in this book --
2. General preliminaries -- 2.1 Attributes, features, and variables -- 2.2 Data instances and datasets -- 2.3 Attribute binning and discretization -- 2.4 Patterns, matching datasets, supports, and frequent patterns -- 2.5 Equivalence classes, closed patterns, minimal generators, and borders -- 2.6 Illustrating examples --
3. Emerging patterns and a flexible mining algorithm -- 3.1 Setting for group difference analysis -- 3.2 Basics of emerging patterns -- 3.3 BorderDiff: a simple, flexible emerging pattern mining algorithm -- 3.4 What emerging patterns can represent -- 3.5 Comparison with association rules, confidence, and odds ratio -- 3.6 Pointers to sections illustrating uses of emerging patterns -- 3.7 Traditional analysis of group differences -- 3.8 Discussion of related issues --
4. CAEP: classification by aggregating multiple matching emerging patterns -- 4.1 Background materials on classification -- 4.2 The CAEP approach -- 4.2.1 CAEP's class-likelihood computation -- 4.2.2 CAEP's likelihood normalization -- 4.2.3 Emerging pattern set selection -- 4.2.4 The CAEP training and testing algorithms -- 4.3 A small illustrating example -- 4.4 Experiments and applications by other researchers -- 4.5 Strengths and uniqueness of CAEP -- 4.5.1 Strengths of CAEP -- 4.5.2 Uniqueness of CAEP -- 4.6 DeEPs: instance-based classification using emerging patterns -- 4.7 Relationship with other rule/pattern-based classifiers -- 4.8 Discussion --
5. CAEP for classification on tiny training datasets, compound selection, and instance selection -- 5.1 CAEP performs well on tiny training data -- 5.1.1 Details on data used for compound selection -- 5.2 Using CAEP for compound selection -- 5.3 Iterative algorithm for extreme instance selection -- 5.4 Semi-supervised extreme instance selection vs. semi-supervised learning --
6. OCLEP: one-class intrusion detection and anomaly detection -- 6.1 Background on intrusion detection, anomaly detection, and outlier detection -- 6.2 OCLEP: emerging pattern length-based intrusion detection -- 6.2.1 An observation on emerging pattern's length -- 6.2.2 What emerging patterns to use and their mining -- 6.2.3 OCLEP's training and testing algorithms -- 6.3 Experimental evaluation of OCLEP -- 6.3.1 Details of the NSL-KDD dataset -- 6.3.2 Intrusion detection on the NSL-KDD dataset -- 6.3.3 Masquerader detection on command sequences -- 6.4 Discussion --
7. CPCQ: contrast pattern based clustering-quality evaluation -- 7.1 Background on clustering-quality evaluation -- 7.2 CPCQ's rationale -- 7.3 Measuring quality of CPs -- 7.4 Measuring diversity of high-quality CPs -- 7.5 Defining CPCQ -- 7.6 Mining CPs and computing the best N groups of CPs to maximize CPCQ values -- 7.7 Experimental evaluation of CPCQ -- 7.8 Discussion --
8. CPC: pattern-based clustering maximizing CPCQ -- 8.1 Notations -- 8.2 Background on clustering and clustering evaluation -- 8.3 Problem setting and guiding ideas for CPC -- 8.4 Main technical measures -- 8.4.1 MPQ between two patterns -- 8.4.2 MPQ between a pattern and a pattern set -- 8.5 The CPC algorithm -- 8.6 General experimental evaluation of CPC -- 8.7 Text data analysis on blogs using CPC -- 8.8 Discussion --
9. IBIG: ranking genes and attributes for complex diseases and complex problems -- 9.1 Basics of the gene-ranking problem -- 9.2 Background on complex diseases -- 9.3 Capturing interactions using jumping emerging patterns -- 9.4 The IBIG approach -- 9.4.1 High-level view of the IBIG approach -- 9.4.2 IBIG gene ranking based on a set of emerging patterns -- 9.4.3 Gene clubs and computing gene clubs -- 9.4.4 The iterative IBIG algorithm: IBIGi -- 9.5 Experimental findings on IBIG on colon cancer data -- 9.5.1 High-quality JEPs often involve lowly IG-ranked genes and IBIGi can find many of them -- 9.5.2 Significant gene-rank differences between IG and IBIG -- 9.6 Discussion --
10. CPXR and CPXC: pattern aided prediction modeling and prediction model analysis -- 10.1 Background materials -- 10.2 Pattern aided prediction models -- 10.2.1 Fitting local models for logical subpopulations -- 10.2.2 Pattern aided prediction models -- 10.3 CPXP: contrast pattern aided prediction -- 10.4 Relationship with boosting and ensemble member selection -- 10.5 Diverse predictor-response relationships -- 10.6 Uses of CPXR and CPXC in experiments -- 10.6.1 Experiments on commonly used datasets -- 10.6.2 Applications for agriculture and healthcare predictions -- 10.7 Subpopulationwise conditional correlation analysis -- 10.8 Discussion --
11. Other approaches and applications using emerging patterns -- 11.1 Compound activity analysis -- 11.2 Structure-activity relationship exploration and analysis -- 11.3 Metabolite biomarker discovery -- 11.4 Structural alerts for molecular toxicity -- 11.5 Identifying disease subtypes, and disease treatment planning -- 11.6 Safety and street crime analysis -- 11.7 Characterizing music families -- 11.8 Identifying interaction terms: adverse drug reaction analysis -- 11.9 Coupled hidden Markov model for critical patient care -- 11.10 Pose-based human activity recognition -- 11.11 Protein complex detection -- 11.12 Inhibitor prediction combining FCA and JEP -- 11.13 Instant activity recognition in video sequences -- 11.14 Birth defect detection -- 11.15 Surgery stage identification and feedback delivery -- 11.16 Sensor-based activity recognition -- 11.17 Online banking fraud detection -- 11.18 Other EP-based classification approaches and studies -- 11.19 Emerging patterns for classification over streaming data -- 11.20 Other studies and applications -- 11.21 Summary of uses: application domain perspective -- 11.22 Discussion --
Bibliography -- Author's biography -- Index.
Abstract: This book presents pattern-based problem-solving methods for a variety of machine learning and data analysis problems. The methods are all based on techniques that exploit the power of group differences. They make use of group differences represented using emerging patterns (aka contrast patterns), which are patterns that match significantly different numbers of instances in different data groups. A large number of applications outside of the computing discipline are also included. Emerging patterns (EPs) are useful in many ways. EPs can be used as features, as simple classifiers, as subpopulation signatures/characterizations, and as triggering conditions for alerts. EPs can be used in gene ranking for complex diseases since they capture multi-factor interactions. The length of EPs can be used to detect anomalies, outliers, and novelties. Emerging/contrast pattern-based methods for clustering analysis and outlier detection do not need distance metrics, avoiding pitfalls of the latter in exploratory analysis of high-dimensional data. EP-based classifiers can achieve good accuracy even when the training datasets are tiny, making them useful for exploratory compound selection in drug design. EPs can serve as opportunities in opportunity-focused boosting and are useful for constructing powerful conditional ensembles. EP-based methods often produce interpretable models and results. In general, EPs are useful for classification, clustering, outlier detection, gene ranking for complex diseases, prediction model analysis and improvement, and so on. EPs are useful for many tasks because they represent group differences, which have extraordinary power. Moreover, EPs represent multi-factor interactions, whose effective handling is of vital importance and is a major challenge in many disciplines. Based on the results presented in this book, one can clearly say that patterns are useful, especially when they are linked to issues of interest.
We believe that many effective ways to exploit group differences' power still remain to be discovered. Hopefully this book will inspire readers to discover such new ways, besides showing them existing ways, to solve various challenging problems.
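
The abstract's definition of an emerging pattern (a pattern that matches significantly different numbers of instances in different data groups) can be made concrete with a small sketch. It assumes the commonly used definitions of support (the fraction of a group's instances that contain every item of the pattern) and growth rate (the ratio of the pattern's support in a target group to its support in a background group). The function names, the `min_growth` threshold of 2.0, and the toy data are hypothetical illustrations, not code or data from the book.

```python
from typing import FrozenSet, List

# An instance (and a pattern) is modelled as a set of "attribute=value" items.
Itemset = FrozenSet[str]


def support(pattern: Itemset, dataset: List[Itemset]) -> float:
    """Fraction of instances in `dataset` that match `pattern`,
    i.e. contain every item of the pattern."""
    if not dataset:
        return 0.0
    matches = sum(1 for instance in dataset if pattern <= instance)
    return matches / len(dataset)


def growth_rate(pattern: Itemset, background: List[Itemset], target: List[Itemset]) -> float:
    """Ratio supp_target / supp_background.
    Assumed convention: 0/0 -> 0; positive/0 -> infinity (a "jumping" emerging pattern)."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0.0:
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


def is_emerging(pattern: Itemset, background: List[Itemset], target: List[Itemset],
                min_growth: float = 2.0) -> bool:
    """Hypothetical threshold test: the growth rate is at least `min_growth`."""
    return growth_rate(pattern, background, target) >= min_growth


# Made-up toy data: two groups of four instances each.
group_a = [frozenset({"smoker=no", "age=young"}), frozenset({"smoker=yes", "age=young"}),
           frozenset({"smoker=no", "age=old"}), frozenset({"smoker=no", "age=young"})]
group_b = [frozenset({"smoker=yes", "age=old"}), frozenset({"smoker=yes", "age=old"}),
           frozenset({"smoker=yes", "age=young"}), frozenset({"smoker=no", "age=old"})]

for p in (frozenset({"smoker=yes"}), frozenset({"smoker=yes", "age=old"}), frozenset({"age=young"})):
    print(sorted(p), growth_rate(p, group_a, group_b), is_emerging(p, group_a, group_b))
```

In this toy data, {smoker=yes} matches one of four instances in group_a and three of four in group_b, so its growth rate from group_a to group_b is 3; {smoker=yes, age=old} never occurs in group_a, so it behaves as a jumping pattern with infinite growth rate.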

Item type: E books
Current location: PK Kelkar Library, IIT Kanpur
Status: Available
Barcode: EBKE846
Total holds: 0

Mode of access: World Wide Web.

System requirements: Adobe Acrobat Reader.

Part of: Synthesis digital library of engineering and computer science.

Includes bibliographical references (pages 101-123) and index.

Abstract freely available; full-text restricted to subscribers or individual document purchasers.

Indexed in: Compendex; INSPEC; Google Scholar; Google Book Search.

Title from PDF title page (viewed on February 27, 2019).
